Retrieval, Crawling and Fusion of Entity-centric Data on the Web

نویسنده

  • Stefan Dietze
چکیده

While the Web of (entity-centric) data has seen tremendous growth over the past years, take-up and re-use is still limited. Data vary heavily with respect to their scale, quality, coverage or dynamics, what poses challenges for tasks such as entity retrieval or search. This chapter provides an overview of approaches to deal with the increasing heterogeneity of Web data. On the one hand, recommendation, linking, profiling and retrieval can provide efficient means to enable discovery and search of entity-centric data, specifically when dealing with traditional knowledge graphs and linked data. On the other hand, embedded markup such as Microdata and RDFa has emerged a novel, Web-scale source of entitycentric knowledge. While markup has seen increasing adoption over the last few years, driven by initiatives such as schema.org, it constitutes an increasingly important source of entity-centric data on the Web, being in the same order of magnitude as the Web itself with regards to dynamics and scale. To this end, markup data lends itself as a data source for aiding tasks such as knowledge base augmentation, where data fusion techniques are required to address the inherent characteristics of markup data, such as its redundancy, heterogeneity and lack of links. Future directions are concerned with the exploitation of the complementary nature of markup data and traditional knowledge graphs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Entity Retrieval and Text Mining for Online Reputation Monitoring

Online Reputation Monitoring (ORM) is concerned with the use of computational tools to measure the reputation of entities online, such as politicians or companies. In practice, current ORM methods are constrained to the generation of data analytics reports, which aggregate statistics of popularity and sentiment on social media. We argue that this format is too restrictive as end users often lik...

متن کامل

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

متن کامل

Using Events for Content Appraisal and Selection in Web Archives

With the rapidly growing volume of resources on the Web, Web archiving becomes an important challenge. In addition, the notion of community memories extends traditional Web archives with related data from a variety of sources on the Social Web. Community memories take an entity-centric view to organise Web content according to the events and the entities related to them, such as persons, organi...

متن کامل

Preservation of Social Web Content based on Entity Extraction and Consolidation

With the rapidly increasing pace at which Web content is evolving, particularly social media, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view to organise Web resources according to the information objects related to them. Therefore, the crucial challenge is to extract, detect and correlate e...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016